Utilizing Lexical Similarity between Related, Low-resource Languages for Pivot-based SMT

Authors

  • Anoop Kunchukuttan
  • Maulik Shah
  • Pradyot Prakash
  • Pushpak Bhattacharyya
Abstract

We investigate pivot-based translation between related languages in a low-resource, phrase-based SMT setting. We show that a subword-level pivot-based SMT model using a related pivot language is substantially better than word-level and morpheme-level pivot models. It is also highly competitive with the best direct translation model, which is encouraging because no direct source-target training corpus is used. We also show that combining multiple related-language pivot models can rival a direct translation model. Thus, the use of subwords as translation units, coupled with multiple related pivot languages, can compensate for the lack of a direct parallel corpus.
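The abstract does not say how the pivot models are built. One widely used way to pivot in phrase-based SMT is phrase-table triangulation, where source-pivot and pivot-target translation probabilities are composed by summing over shared pivot phrases: p(t|s) = Σ_p p(t|p) · p(p|s). The sketch below illustrates that idea only; the function name `triangulate`, the nested-dictionary table layout, and the toy subword units and probabilities are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Compose two toy phrase tables through a pivot language.

    src_pivot maps source unit -> {pivot unit: p(pivot | source)}.
    pivot_tgt maps pivot unit  -> {target unit: p(target | pivot)}.
    Returns source unit -> {target unit: p(target | source)}, obtained by
    marginalizing over every pivot unit the two tables share.
    """
    src_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_pivot.items():
        for piv, p_piv_given_src in pivots.items():
            # Pivot units missing from the second table contribute nothing.
            for tgt, p_tgt_given_piv in pivot_tgt.get(piv, {}).items():
                src_tgt[src][tgt] += p_piv_given_src * p_tgt_given_piv
    return {src: dict(tgts) for src, tgts in src_tgt.items()}


# Hypothetical subword-level entries with made-up probabilities.
SRC_PIVOT = {"s1": {"p1": 0.5, "p2": 0.5}}
PIVOT_TGT = {"p1": {"t1": 1.0}, "p2": {"t1": 0.5, "t2": 0.5}}

TRIANGULATED = triangulate(SRC_PIVOT, PIVOT_TGT)
```

With several pivot languages, one triangulated table per pivot could be built this way and the resulting models combined, in the spirit of the multi-pivot combination the abstract mentions; the actual combination scheme is not given here.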

Similar resources

Utilizing Lexical Similarity for pivot translation involving resource-poor, related languages

We investigate pivot-based translation between related languages in a low-resource, phrase-based SMT setting. We show that a subword-level pivot-based SMT model using a related pivot language is substantially better than word-level and morpheme-level pivot models. It is also highly competitive with the best direct translation model, which is encouraging as no direct source-target training corpus is us...

Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT

Recent research on multilingual statistical machine translation (SMT) focuses on the usage of pivot languages in order to overcome resource limitations for certain language pairs. This paper proposes a new method to translate a dialect language into a foreign language by integrating transliteration approaches based on Bayesian co-segmentation (BCS) models with pivot-based SMT approaches. The ad...

Statistical Machine Translation between Related Languages

Language-independent Statistical Machine Translation (SMT) has proven to be very challenging. The diversity of languages makes high accuracy difficult and requires a substantial parallel corpus as well as linguistic resources (parsers, morph analyzers, etc.). An interesting observation is that a large chunk of machine translation (MT) requirements involve related languages. They are either: (i) ...

Local lexical adaptation in Machine Translation through triangulation: SMT helping SMT

We present a framework where auxiliary MT systems are used to provide lexical predictions to a main SMT system. In this work, predictions are obtained by means of pivoting via auxiliary languages, and introduced into the main SMT system in the form of a low-order language model, which is estimated on a sentence-by-sentence basis. The linear combination of models implemented by the decoder is thu...

Language Independent Connectivity Strength Features for Phrase Pivot Statistical Machine Translation

An important challenge to statistical machine translation (SMT) is the lack of parallel data for many language pairs. One common solution is to pivot through a third language for which there exist parallel corpora with the source and target languages. Although pivoting is a robust technique, it introduces some low-quality translations. In this paper, we present two language-independent features...

Publication date: 2017